[00:01.370 --> 00:06.380]  So real quick about me. I have a bit of a reputation for approaching this from an academic
[00:06.380 --> 00:11.460]  mindset, simply because I got started in the whole password cracking thing
[00:12.020 --> 00:19.440]  through my research when I was getting my PhD. But I really do strongly believe in learning by
[00:19.440 --> 00:25.640]  doing. So I'm an active member of Team John the Ripper, and I participate in password cracking
[00:25.640 --> 00:29.920]  contests like Crack Me If You Can, which is going on right now. Luckily, this talk is
[00:29.920 --> 00:34.800]  being filmed before the contest starts. So no spoilers here,
[00:34.800 --> 00:40.160]  unfortunately. But good luck to everyone else who's participating. So password cracking
[00:40.160 --> 00:46.920]  really is my hobby and a little bit of an obsession. But it's not my day job, unfortunately.
[00:47.120 --> 00:53.320]  But my day job has been very exciting recently, because I focus on medical device
[00:53.320 --> 01:00.720]  security. And you can imagine, with all the craziness around COVID-19, it's been an interesting
[01:00.720 --> 01:06.800]  time. One project that I really want to highlight is the Open Ventilator Monitoring
[01:06.800 --> 01:12.120]  and Alerting Project that I've been helping to contribute to. There's actually a talk at the
[01:12.120 --> 01:17.340]  Biohacking Village this Sunday that I highly recommend people listen to;
[01:17.340 --> 01:22.180]  some of our team members are giving it. It's going to cover
[01:22.180 --> 01:27.160]  the lessons learned and how other people can help contribute as well. This has been a big
[01:27.160 --> 01:32.820]  problem: as I'm sure you're aware, there's been a huge demand for ventilators to
[01:32.820 --> 01:38.200]  help deal with COVID-19. So a lot of different projects have stood up
[01:38.200 --> 01:44.500]  to try to produce low-cost ventilators to help fill that need pretty
[01:44.500 --> 01:50.140]  quickly. So rather than have every single do-it-yourself ventilator develop its
[01:50.140 --> 01:54.760]  own monitoring and alerting framework, we're trying to produce one common framework that can be
[01:54.760 --> 02:00.700]  applied to all these different projects across the board. When you have these ventilators
[02:00.700 --> 02:04.800]  treating patients, the patients are highly infectious, so you don't want
[02:04.800 --> 02:11.040]  to have the nurses exposed to that. But if something goes wrong, seconds
[02:11.040 --> 02:15.240]  count. So you need to be able to forward all the sensor data that these ventilators are
[02:15.240 --> 02:19.380]  collecting back to a centralized nursing workstation. And you need to do that securely, because you're
[02:19.380 --> 02:27.300]  not necessarily on a real hospital network. So that's been a really fulfilling project that I've been
[02:27.300 --> 02:32.580]  working on. Another thing I'm helping out with here, as I move my head, is that I'm helping
[02:32.580 --> 02:38.400]  run the DEF CON Biohacking Village's Capture the Flag contest. This was originally supposed
[02:38.400 --> 02:42.780]  to be in Vegas. That changed, of course. So now a lot of medical equipment is actually sitting in
[02:42.780 --> 02:48.060]  my house. So I have to provide a way for hackers from all over the world to be able
[02:48.060 --> 02:52.600]  to log in and hack these infusion pumps here without also hacking my smart thermostat.
[02:53.120 --> 02:57.180]  As part of that, I actually had to repurpose one of my password cracking rigs, as you can see there,
[02:57.180 --> 03:02.960]  to run all the VMs that keep people focused on those medical devices,
[03:02.960 --> 03:11.880]  hacking those and not my smart thermostat. So probably the first
[03:11.880 --> 03:19.120]  question I should address here is: what does PCFG stand for?
[03:21.200 --> 03:27.400]  Originally, and I guess still technically, it stands for Probabilistic Context-Free Grammar,
[03:27.400 --> 03:31.580]  which is the modeling framework it uses to model how people create passwords.
[03:31.740 --> 03:36.140]  So if you're into automata theory or formal languages, this might actually
[03:36.140 --> 03:40.380]  mean something to you. But most people hear that and they're like, oh god,
[03:40.380 --> 03:43.860]  that's math and stuff like that. There's no way it's going to run on my computer.
[03:44.420 --> 03:50.060]  And then they kind of slowly walk away. So I decided it needed a more
[03:50.060 --> 03:57.020]  descriptive name, and I rebranded it the Pretty Cool Fuzzy Guesser. This
[03:57.020 --> 04:00.280]  explains a little bit better what it's actually doing under the hood,
[04:00.280 --> 04:04.600]  because you train it on a list of passwords, and then it creates guesses that
[04:04.600 --> 04:09.560]  are similar to those passwords, but different, which is really important in order to
[04:09.560 --> 04:14.520]  help expand your cracking session. I think this is my favorite slide I've ever made.
[04:14.520 --> 04:22.000]  So it's all downhill from here. Really, what it's doing is using machine learning
[04:22.000 --> 04:26.280]  to crack passwords. And when I say machine learning, I mean it in the traditional
[04:26.280 --> 04:30.380]  sense of a whole bunch of if-then statements. It's not using neural networks or artificial
[04:30.380 --> 04:35.060]  intelligence. But you are training on passwords that you expect to be somewhat similar to the
[04:35.060 --> 04:40.740]  target passwords that you're cracking. And when it processes that training password set,
[04:40.740 --> 04:45.660]  it extracts all sorts of probability information about the components of those passwords that
[04:45.660 --> 04:50.000]  it finds. It figures out things like capitalization masks, whether numbers
[04:50.000 --> 04:54.080]  go at the beginning of the password versus the end, the probability of individual letters and
[04:54.080 --> 04:59.940]  numbers found in those passwords, keyboard walks, and so on. Then it creates a
[04:59.940 --> 05:03.500]  model based on all those different types of probability information, and it uses
[05:03.500 --> 05:08.160]  that model to generate highly probable password guesses in probability order. So it'll
[05:08.160 --> 05:11.220]  start with the most probable password guess and go to the second most probable password guess and
[05:11.220 --> 05:15.420]  then go to the third one and so on until you crack the password that you're trying to find or
[05:15.420 --> 05:26.590]  you give up. Let me just move my thing here a little bit. So just to tie this back
[05:26.590 --> 05:31.570]  into what's probably going on right now: as I said, I don't know what the actual contest is going to
[05:31.570 --> 05:37.810]  be like for Crack Me If You Can. But KoreLogic helpfully provided a brief summary of
[05:37.810 --> 05:42.890]  what the scenario is going to be. We're going to be targeting 12 different
[05:42.890 --> 05:47.610]  individuals, and those individuals change their passwords over time in order to deal
[05:47.610 --> 05:52.010]  with more complex password creation requirements. That sounds a little bit like
[05:52.010 --> 05:55.850]  something that a PCFG might actually be useful for, so I'm really optimistic about this
[05:55.850 --> 06:00.930]  contest. You'll see how optimistic I still am on Saturday, when I'm actually
[06:00.930 --> 06:07.250]  giving this talk. But this is the kind of scenario that this was originally
[06:07.250 --> 06:13.290]  developed for in the first place: you know how a subject creates passwords, so you
[06:13.290 --> 06:16.490]  want to create passwords similar to those. But you also want to change them, because
[06:16.490 --> 06:22.570]  maybe, for example, there are more complex rules or complex password creation requirements
[06:22.570 --> 06:28.590]  added on top of that. I'm probably available on Discord right now, so I'll be
[06:28.590 --> 06:31.670]  able to answer questions about how you might be able to tweak this in order to
[06:31.670 --> 06:39.020]  help in a scenario like this. The nice thing about there being a lot of academic papers about this,
[06:39.020 --> 06:42.880]  though, is that when I give a talk like this, I don't actually have to create any of my own graphs. I
[06:42.880 --> 06:46.440]  can just go to other papers, look at the research that other people have done,
[06:46.440 --> 06:49.020]  and pull out their graphs to talk about here.
[06:50.480 --> 06:53.400]  One thing that I really want to highlight, though, and you need to
[06:53.400 --> 06:57.960]  look at this with a bit of a skeptical eye, is that you'll notice that all these cracking
[06:57.960 --> 07:03.320]  sessions are really short. I know one trillion guesses here
[07:03.320 --> 07:07.260]  might sound like a lot. But when you start talking about GPU password cracking,
[07:07.260 --> 07:11.740]  you're talking about under a second to generate all those. So that's just
[07:11.740 --> 07:18.620]  no time whatsoever. Part of the reason for this is that the PCFG approach is very slow;
[07:18.620 --> 07:24.900]  it doesn't currently scale very well with multi-threading. So when you start
[07:24.900 --> 07:28.280]  talking about passwords that you want to crack, it works very well
[07:28.280 --> 07:31.780]  when you're targeting very slow password hashes, where you can only make
[07:31.780 --> 07:35.940]  thousands of guesses a second because the hash is so slow. But when we start
[07:35.940 --> 07:40.060]  talking about things like unsalted MD5, other attacks are going to be much
[07:40.060 --> 07:44.160]  more effective, because you can just make so many more guesses in the same time frame.
[07:44.640 --> 07:49.640]  So when you're looking at faster password hashes, you can certainly
[07:49.640 --> 07:54.080]  still use a PCFG to supplement your attack, and you can still crack
[07:54.080 --> 07:59.980]  some passwords that you might not normally get. But in general, for the faster password hashes,
[07:59.980 --> 08:02.960]  you really are going to want to use more traditional types of password cracking
[08:02.960 --> 08:06.740]  attacks to really make use of the hardware you have available.
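To put rough numbers on that speed argument, here is a small Python sketch (an editorial illustration, not part of the PCFG tool set) that converts a guess budget into wall-clock time. The two guessing rates are assumptions picked for illustration: about 5,000 guesses per second for a slow hash, in line with the "thousands of guesses a second" mentioned above, and a hypothetical 10^12 guesses per second for a large multi-GPU rig attacking a fast, unsalted hash.

```python
def seconds_to_exhaust(guesses: float, guesses_per_second: float) -> float:
    """Wall-clock seconds needed to try `guesses` candidates at a fixed rate."""
    if guesses_per_second <= 0:
        raise ValueError("guessing rate must be positive")
    return guesses / guesses_per_second


def human_readable(seconds: float) -> str:
    """Format a duration in the largest unit that gives a value of at least 1."""
    units = [
        ("years", 365 * 24 * 3600),
        ("days", 24 * 3600),
        ("hours", 3600),
        ("minutes", 60),
        ("seconds", 1),
    ]
    for name, size in units:
        if seconds >= size:
            return f"{seconds / size:.1f} {name}"
    return f"{seconds:.3f} seconds"


if __name__ == "__main__":
    budget = 1e12  # one trillion guesses
    # Illustrative rates only; real speeds depend on the hash type and hardware.
    scenarios = [
        ("slow hash, ~5,000 guesses/s (assumed)", 5e3),
        ("fast unsalted hash, ~1e12 guesses/s (hypothetical multi-GPU rig)", 1e12),
    ]
    for label, rate in scenarios:
        print(f"{label}: {human_readable(seconds_to_exhaust(budget, rate))}")
```

Under those assumed rates, the same trillion guesses take about a second against the fast hash but roughly 6.3 years against the slow one, which is why a slower but smarter guess generator fits slow hashes best.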
[08:09.940 --> 08:15.900]  I want to focus on this graph, though, because this
[08:15.900 --> 08:21.120]  was a really neat study done by Carnegie Mellon University. One of the problems when you look
[08:21.120 --> 08:23.700]  at academic research, especially when you start talking about offensive
[08:25.360 --> 08:32.880]  tactics, is that the academics are running the attacks themselves. So you're really measuring
[08:32.880 --> 08:37.720]  how effective students are at cracking passwords, not necessarily how effective a professional would be.
[08:37.720 --> 08:43.840]  So CMU took probably the most straightforward approach
[08:43.840 --> 08:48.040]  to solving that problem: they reached out to KoreLogic. You might have
[08:48.040 --> 08:51.000]  heard of them. They're running this Password Village, and they run the Crack Me If You
[08:51.000 --> 08:55.160]  Can competition. So when you're trying to find an expert, they're
[08:55.160 --> 09:00.220]  way up there, which makes them a pretty good stand-in for one. What they
[09:00.220 --> 09:05.380]  did was give one of the KoreLogic engineers a password list and ask them to
[09:05.380 --> 09:10.720]  crack it. They recorded how many passwords were cracked over time, measured by the
[09:10.720 --> 09:14.060]  number of guesses made, and then compared that against other cracking sessions as well.
[09:14.920 --> 09:18.500]  And one thing that makes me smile every time I see this
[09:18.500 --> 09:25.480]  is that the PCFG did really well compared to the pros, KoreLogic,
[09:25.480 --> 09:30.620]  for that short cracking session. So when you start asking whether this can represent
[09:31.040 --> 09:37.660]  how a real professional password cracker operates, the short answer is that it
[09:37.660 --> 09:42.220]  certainly appears to. Now,
[09:42.220 --> 09:47.460]  full disclosure: when you gave KoreLogic more time, they definitely performed
[09:47.460 --> 09:52.020]  way better. This is a logarithmic graph, so that's about 100 times more guesses.
[09:52.700 --> 09:58.360]  I'll also admit this wasn't fair to KoreLogic either, because that's not typically
[09:58.360 --> 10:04.660]  how people crack passwords in real life. It was such a short
[10:04.660 --> 10:09.800]  cracking session, and when a session is that short, it's usually because you're up against a really strong password hash,
[10:09.800 --> 10:13.940]  and then you have a lot of time to manually tweak the attack that you're running
[10:13.940 --> 10:20.640]  there. That being said, if any of you are listening, I would love to have a repeat or a
[10:20.640 --> 10:28.220]  rematch of this, just to see how it performs with all the new improvements
[10:28.220 --> 10:32.180]  that have been made to PCFGs. And I'm sure that KoreLogic has really been upping
[10:32.180 --> 10:43.900]  their game over the years as well. So I'm somewhat hopeful that we'll be able to find out
[10:43.900 --> 10:51.130]  soon. But enough about the research side. Let's talk about how to
[10:51.130 --> 10:56.250]  actually make use of this PCFG password cracker. The first thing is you just
[10:56.250 --> 11:01.890]  download it from the GitHub repo. As for requirements, I've really strived to
[11:01.890 --> 11:08.510]  make it as simple as possible: you need to have Python 3, and that's it. There's an optional
[11:09.190 --> 11:16.210]  chardet Python module that can help during training, because it helps detect
[11:16.210 --> 11:19.970]  what character encoding the training set uses, and character encoding is the bane of my
[11:19.970 --> 11:24.910]  password cracking existence. But even that's optional. And actually, it's now being installed
[11:24.910 --> 11:30.570]  as part of pip. So if you have pip, you probably don't even need to install it yourself.
[11:30.570 --> 11:34.350]  And this is really useful, because I find a lot of situations where, when I'm cracking
[11:34.350 --> 11:38.410]  passwords, I don't have internet access. So it's really nice to be able to
[11:38.410 --> 11:43.510]  quickly throw my tool on a box and get it running. If you can run Python 3 on a box,
[11:43.510 --> 11:47.850]  you can probably run this. I've tried it on a bunch of different OSes. I've actually even
[11:47.850 --> 11:52.470]  gotten it to run on NetBSD, and it was the only thing I've ever gotten to run on NetBSD.
[11:53.310 --> 11:57.630]  So hopefully this is easier to get installed than your typical academic tool set,
[11:57.630 --> 12:01.230]  so you can start cracking passwords as quickly as possible.
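As a rough illustration of why encoding detection matters for training, here is a minimal Python sketch of that kind of check, assuming a training file with one password per line. It is not the trainer's actual code, and the function names are hypothetical. It tries strict UTF-8 first, falls back to the guess from the optional `chardet` module if it happens to be installed, and finally falls back to Latin-1, which can decode any byte sequence.

```python
from pathlib import Path
from typing import List

try:
    import chardet  # optional third-party module; the sketch works without it
except ImportError:
    chardet = None


def detect_encoding(raw: bytes) -> str:
    """Guess the text encoding of a password training file."""
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass
    if chardet is not None:
        guess = chardet.detect(raw).get("encoding")
        if guess:
            return guess
    # Latin-1 maps every byte to a code point, so decoding can never fail.
    return "latin-1"


def parse_training_bytes(raw: bytes) -> List[str]:
    """Decode raw file contents and return one password per non-empty line."""
    encoding = detect_encoding(raw)
    try:
        text = raw.decode(encoding, errors="replace")
    except LookupError:  # chardet named a codec this Python does not know
        text = raw.decode("latin-1")
    return [line for line in text.splitlines() if line]


def load_training_passwords(path: str) -> List[str]:
    """Read a training list from disk (hypothetical helper for this sketch)."""
    return parse_training_bytes(Path(path).read_bytes())
```

Trying UTF-8 first keeps the common case deterministic; the statistical guess is only consulted when strict UTF-8 decoding has already failed.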
[12:02.710 --> 12:06.930]  Now let's talk about hardware requirements, because that's always an important topic when
[12:06.930 --> 12:13.590]  we talk about password cracking. The PCFG tool set is single-threaded and CPU-bound,
[12:13.590 --> 12:20.370]  which is why it's so awfully slow. But it will use an entire CPU thread, so you really need
[12:20.370 --> 12:28.070]  to dedicate one full CPU thread to the PCFG. The other thing is that it has very high RAM
[12:28.070 --> 12:33.470]  usage. It maintains a lot of different data structures in memory, and those data
[12:33.470 --> 12:38.150]  structures become more complex over time, so it just grows. I could have done some things,
[12:38.150 --> 12:45.590]  like trying to prune that or moving some of it to disk. But RAM is cheap, so I haven't.
[12:45.630 --> 12:50.570]  So it'll just keep on growing over time. Initially, memory usage starts out pretty low.
[12:50.650 --> 12:53.630]  But if you're talking about running a password cracking session for a week
[12:53.630 --> 12:57.990]  or two, you really need to have at least 16 gigabytes of RAM to fully
[12:57.990 --> 13:08.300]  dedicate to the PCFG tool set itself. The next step is to actually run it.
[13:08.800 --> 13:14.200]  I apologize up front that I tend to use the words "rule set" and "grammar" interchangeably.
[13:14.200 --> 13:19.580]  To me, at least, they mean the exact same thing. What I'm really talking about is that,
[13:19.580 --> 13:23.380]  as I've mentioned with machine learning a couple of times here, you have to
[13:23.380 --> 13:29.320]  train a grammar on an existing password data set. Now, you may want to have
[13:29.320 --> 13:33.280]  different grammars for the different targets that you're going after. So if you're trying
[13:33.280 --> 13:37.980]  to target a web application catering to younger people, you might want to train it on
[13:37.980 --> 13:41.420]  passwords that resemble that. If you're trying to target corporate passwords, you might
[13:41.420 --> 13:46.280]  want to train on corporate passwords instead, and use those rule sets against target passwords that
[13:46.280 --> 13:50.480]  you think will match them. You can have as many rule sets as you want
[13:50.480 --> 13:54.180]  to really fine-tune your cracking session.
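To make "training a grammar" a bit more concrete, here is a deliberately simplified Python sketch. It is an editorial illustration under stated assumptions, not the tool's trainer: it only splits passwords into runs of letters (A), digits (D) and other symbols (O), with labels modeled loosely on the A5-style codes that show up in the tool's status output (the exact letters here are an assumption), and it counts base structures, terminals and capitalization masks. The real trainer also detects keyboard walks, multiwords and more.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable

# Runs of ASCII letters (A), digits (D), or anything else (O).
RUN_PATTERN = re.compile(r"(?P<A>[A-Za-z]+)|(?P<D>[0-9]+)|(?P<O>[^A-Za-z0-9]+)")


def normalize(counts: Counter) -> Dict[str, float]:
    """Turn raw counts into probabilities that sum to 1."""
    total = sum(counts.values())
    return {item: n / total for item, n in counts.items()}


def train_grammar(passwords: Iterable[str]) -> dict:
    structures = Counter()            # e.g. "A5D4" -> count
    terminals = defaultdict(Counter)  # e.g. "D4" -> {"1234": count}
    cap_masks = defaultdict(Counter)  # e.g. "A5" -> {"ULLLL": count}
    for password in passwords:
        tags = []
        for match in RUN_PATTERN.finditer(password):
            run = match.group()
            tag = f"{match.lastgroup}{len(run)}"
            tags.append(tag)
            if match.lastgroup == "A":
                # Store letters lowercased; capitalization is modeled separately.
                terminals[tag][run.lower()] += 1
                mask = "".join("U" if ch.isupper() else "L" for ch in run)
                cap_masks[tag][mask] += 1
            else:
                terminals[tag][run] += 1
        if tags:
            structures["".join(tags)] += 1
    return {
        "structures": normalize(structures),
        "terminals": {tag: normalize(c) for tag, c in terminals.items()},
        "cap_masks": {tag: normalize(c) for tag, c in cap_masks.items()},
    }
```

For example, training on `["hello1234", "Hello1234", "abcde99", "pass!"]` gives the base structure `A5D4` a probability of 0.5. Training on a corporate list instead of a consumer list would shift all of these numbers, which is the reason to keep separate rule sets per target.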
[13:54.520 --> 14:00.480]  The default rule set that comes with the PCFG password cracker was actually trained on
[14:01.140 --> 14:05.120]  a subset of 1 million passwords from the RockYou data set, which came out in 2009. Those were
[14:05.120 --> 14:08.700]  web passwords, so there wasn't really any strong password requirement whatsoever.
[14:08.780 --> 14:14.540]  I've been thinking about updating that, so if you have a good data set that you think
[14:14.540 --> 14:18.720]  I should use, I'm open to hearing about it, to make the default a little bit more
[14:18.720 --> 14:24.300]  effective. That being said, the RockYou data set is still extremely effective even to this day.
[14:24.940 --> 14:31.540]  It's just that Blink-182 is not nearly as popular anymore. So after you choose the rule set that
[14:31.540 --> 14:36.240]  you want to use, you start generating guesses. It's a Python program,
[14:36.240 --> 14:44.780]  so you just run python3 with the pcfg_guesser.py tool from the repo.
[14:45.420 --> 14:49.240]  You give it the rule set to use. By default, that's the one named Default, so if you don't
[14:49.920 --> 14:53.960]  specify one, it'll use the grammar trained on RockYou. Then you specify a
[14:53.960 --> 15:00.820]  session name as well, which is also "default" by default. The session is used to restart a
[15:00.820 --> 15:04.460]  password cracking session, so if you have to cancel it for whatever reason, you can
[15:04.460 --> 15:13.700]  restart it back up again. I really want to highlight, though, that the PCFG tool set
[15:13.700 --> 15:18.240]  is only a password guess generator tool set. It will generate password guesses. It'll generate
[15:18.240 --> 15:21.660]  those password guesses in probability order. So it'll start with the most probable password guess,
[15:21.660 --> 15:25.260]  second most probable password guess, and keep on going down the line. It will not actually
[15:25.260 --> 15:30.740]  hash and crack any passwords, so you need to use another password cracking tool for that.
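Here is a sketch of what "generating guesses in probability order" can look like, using a plain best-first search over a priority queue. This is a standard technique swapped in for illustration; the PCFG tool set uses its own, more elaborate traversal. The grammar format (`structures` keyed by tuples of slot names, `terminals` mapping each slot to word probabilities) is hypothetical and made up for this example. The heap and the `seen` set only ever grow, which gives a feel for why this kind of generator keeps using more memory the longer it runs.

```python
import heapq
from typing import Dict, Iterator, Tuple


def generate_guesses(
    structures: Dict[Tuple[str, ...], float],
    terminals: Dict[str, Dict[str, float]],
) -> Iterator[Tuple[str, float]]:
    """Yield (guess, probability) pairs from most to least probable."""
    # Sort each slot's terminals once, most probable first.
    ranked = {
        slot: sorted(words.items(), key=lambda item: item[1], reverse=True)
        for slot, words in terminals.items()
    }
    heap = []
    seen = set()

    def push(slots, p_structure, indices):
        if (slots, indices) in seen:
            return
        seen.add((slots, indices))
        prob = p_structure
        for slot, i in zip(slots, indices):
            prob *= ranked[slot][i][1]
        # heapq is a min-heap, so store the negated probability.
        heapq.heappush(heap, (-prob, slots, indices, p_structure))

    # Start every base structure at its most probable terminal in each slot.
    for slots, p_structure in structures.items():
        push(slots, p_structure, (0,) * len(slots))

    while heap:
        neg_prob, slots, indices, p_structure = heapq.heappop(heap)
        yield "".join(ranked[s][i][0] for s, i in zip(slots, indices)), -neg_prob
        # Each child swaps one slot to its next most probable terminal. A child
        # is never more probable than its parent, so pops come out sorted.
        for pos, slot in enumerate(slots):
            if indices[pos] + 1 < len(ranked[slot]):
                child = indices[:pos] + (indices[pos] + 1,) + indices[pos + 1:]
                push(slots, p_structure, child)


if __name__ == "__main__":
    demo_structures = {("A5", "D4"): 0.6, ("A4", "O1"): 0.4}
    demo_terminals = {
        "A5": {"hello": 0.7, "abcde": 0.3},
        "D4": {"1234": 0.5, "2020": 0.5},
        "A4": {"pass": 1.0},
        "O1": {"!": 0.8, "#": 0.2},
    }
    for guess, prob in generate_guesses(demo_structures, demo_terminals):
        print(f"{prob:.4f}  {guess}")
```

With the demo grammar, `pass!` comes out first, because 0.4 × 1.0 × 0.8 beats every `A5D4` combination, and `pass#` comes out last.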
[15:31.260 --> 15:35.080]  Both John the Ripper and Hashcat work, as does basically any other password cracking tool
[15:35.080 --> 15:39.740]  that accepts guesses from standard input. As I mentioned earlier,
[15:39.740 --> 15:43.520]  I'm on Team John the Ripper, so I'm going to use John the Ripper for pretty much all
[15:43.520 --> 15:49.960]  my examples here. But you can totally use Hashcat as well. To do this, you run the
[15:49.960 --> 15:54.600]  previous command that I talked about, and then you pipe it
[15:54.600 --> 15:59.960]  into, for example, John. John has an option called --stdin, so
[15:59.960 --> 16:04.440]  you type that in, and instead of reading guesses from a wordlist or generating its own
[16:04.440 --> 16:08.820]  password guesses, it'll use the password guesses that are piped into it.
[16:09.780 --> 16:12.900]  And you're cracking passwords. That's really all there is to it.
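If you would rather drive the pipe from a script, here is a small Python sketch that uses only the standard library `subprocess` module. It is an editorial illustration, and several details are assumptions: the `--rule` and `--session` flag names for `pcfg_guesser.py`, the `Default` rule set name, and the hash file path. Check `python3 pcfg_guesser.py --help` and your John the Ripper build before relying on it. `--stdin` is the John the Ripper option described above.

```python
import shlex
import subprocess
from typing import List, Tuple


def build_commands(
    rule: str = "Default",          # assumed name of the bundled rule set
    session: str = "default",
    hash_file: str = "hashes.txt",  # hypothetical path to your target hashes
    guesser: str = "pcfg_guesser.py",
    john: str = "john",
) -> Tuple[List[str], List[str]]:
    """Return argv lists for the guess generator and the cracker."""
    guess_cmd = ["python3", guesser, "--rule", rule, "--session", session]
    crack_cmd = [john, "--stdin", hash_file]
    return guess_cmd, crack_cmd


def pipeline_string(guess_cmd: List[str], crack_cmd: List[str]) -> str:
    """The equivalent shell one-liner, quoted safely."""
    return f"{shlex.join(guess_cmd)} | {shlex.join(crack_cmd)}"


def run_pipeline(guess_cmd: List[str], crack_cmd: List[str]) -> int:
    """Run guess_cmd | crack_cmd and return the cracker's exit status."""
    guesser = subprocess.Popen(guess_cmd, stdout=subprocess.PIPE)
    cracker = subprocess.Popen(crack_cmd, stdin=guesser.stdout)
    # Close our copy of the pipe so the guesser sees a broken pipe if john exits.
    guesser.stdout.close()
    cracker.wait()
    if guesser.poll() is None:
        guesser.terminate()
    guesser.wait()
    return cracker.returncode
```

`pipeline_string(*build_commands())` prints the shell form of the same pipe, which is handy for checking the arguments before calling `run_pipeline`.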
[16:13.980 --> 16:17.820]  There are definitely some optimizations for actually using this in the real world, though.
[16:17.820 --> 16:21.620]  The first thing I really want to highlight is that a lot of times you want to know
[16:21.620 --> 16:26.420]  what the status of a cracking session is. The challenge when you are using a pipe,
[16:26.420 --> 16:29.660]  though, is that if you hit the Enter key on your keyboard, instead of sending that
[16:29.660 --> 16:34.360]  keypress to John the Ripper, it's going to forward it to my tool instead.
[16:34.720 --> 16:39.160]  So you need another way to get John the Ripper to output a status report.
[16:39.160 --> 16:43.280]  The way that you do that is to send a SIGUSR1 signal to John the Ripper.
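The same signal can also be sent from a script. This is a minimal, Unix-only Python sketch that does not ship with either tool. It assumes a system that provides `pgrep` for finding the process by name, and the helper names are hypothetical. It does the same thing as typing `kill -SIGUSR1 <pid>` by hand.

```python
import os
import signal
import subprocess
from typing import List


def find_pids(process_name: str) -> List[int]:
    """PIDs of processes whose name is exactly process_name (needs pgrep)."""
    result = subprocess.run(
        ["pgrep", "-x", process_name], capture_output=True, text=True
    )
    # pgrep exits with status 1 and prints nothing when there is no match.
    return [int(pid) for pid in result.stdout.split()]


def request_status(pid: int) -> None:
    """Same as `kill -SIGUSR1 <pid>`: ask the process to report its status."""
    os.kill(pid, signal.SIGUSR1)
```

For example, `for pid in find_pids("john"): request_status(pid)` asks every running John the Ripper process to print its status line in its own terminal.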
[16:43.440 --> 16:49.200]  If you're running this on a Linux system, you just type in kill -SIGUSR1 and then the
[16:49.200 --> 16:53.020]  process identifier of John the Ripper. When you do that and hit Enter, it'll be just like hitting
[16:53.020 --> 16:57.020]  Enter in John the Ripper itself, and it'll print the status of its
[16:57.020 --> 17:02.400]  current cracking session. So now you can not only see the passwords that
[17:02.400 --> 17:05.960]  are getting cracked, but also see the total number of hashes
[17:05.960 --> 17:09.980]  cracked so far. You can see, for example, the guessing speed; in this case,
[17:09.980 --> 17:13.500]  it's making about 4 million guesses a second. And you can see how long it's been
[17:13.500 --> 17:20.500]  running and all the other statistics as well. Now I want to dig into
[17:20.500 --> 17:25.840]  the output of that cracking session, though, because I think
[17:25.840 --> 17:31.140]  this really helps demonstrate some of the power of using the PCFG.
[17:31.760 --> 17:35.980]  This here is showing the passwords as they get cracked.
[17:37.500 --> 17:41.840]  You can see that it's not just figuring out one rule and
[17:41.840 --> 17:44.860]  then exhausting that rule and then going to the next rule like you would see in the more
[17:44.860 --> 17:49.380]  traditional password cracking session. Instead, it's creating much more fine-grained rules
[17:49.380 --> 17:53.680]  and iterating between all of them depending on their current probability.
[17:53.740 --> 17:56.360]  So when you see these passwords being cracked, it's fun to try to figure out
[17:56.360 --> 18:01.680]  how the underlying system generated that password guess. Why is it making that guess
[18:01.680 --> 18:07.640]  right now? If you look up here initially,
[18:07.640 --> 18:11.020]  this is pretty easy. Okay, it's just taking some five-letter word.
[18:22.900 --> 18:28.000]  I apologize, my microphone just died. The fun of doing DEF CON remotely.
[18:29.520 --> 18:34.480]  So it's using five-letter words plus four digits here.
[18:35.980 --> 18:39.240]  Moving on, though, this is an interesting one here: SESiscool.
[18:39.320 --> 18:43.180]  I looked, and SESiscool was not in my input dictionary or my training
[18:43.180 --> 18:47.640]  set at all. It turned out that it was actually using multiwords for this one, so it was
[18:47.640 --> 18:52.560]  combining SES and then iscool. One cool thing about this, and I'll talk
[18:52.560 --> 18:56.640]  about this a little bit later, is that instead of breaking
[18:56.640 --> 18:59.920]  this up into three words, like we normally would think about it, it actually broke it up into
[18:59.920 --> 19:03.620]  two different words: SES and iscool. That way it can go through and try, say,
[19:03.620 --> 19:10.140]  Katieiscool, Allieiscool, or Bobiscool, because there are a lot of cool
[19:10.140 --> 19:15.280]  people out there. So it can iterate through those and try that type of
[19:15.280 --> 19:22.940]  mangling rule. And what's really cool about this is that it learned that
[19:22.940 --> 19:29.260]  iscool is a common word from the training set itself. I never actually programmed
[19:29.260 --> 19:35.540]  that logic into it. It learned it by itself by looking at the training data, which is,
[19:36.120 --> 19:43.320]  as I said, pretty cool. But you can see that after that, it went into brute force.
[19:43.320 --> 19:46.300]  It wasn't pure brute force, and I'll talk about the different types of brute force later;
[19:46.300 --> 19:49.380]  it was actually combining some very short words, kind of like a combinator attack.
[19:49.520 --> 19:55.680]  But it's still able to get those passwords that way. And then it
[19:55.680 --> 19:59.660]  tried words with special characters, the same special characters at the
[19:59.660 --> 20:04.640]  beginning and the end of them. You might see that in a traditional
[20:04.640 --> 20:07.620]  password cracking session, but you'd actually have to have a rule written to generate it. And
[20:07.620 --> 20:12.360]  trying to create those rules is a real pain, so you won't see them in most
[20:12.360 --> 20:15.720]  common publicly available rule sets. But the PCFG was able to learn it from the training data, which
[20:15.720 --> 20:22.880]  I thought was pretty cool as well. Down here, and I need to get out of the way of
[20:22.880 --> 20:27.760]  the screen, you can see it's trying some longer words. But while these are
[20:27.760 --> 20:33.920]  normal words, it actually generated them via multiwords as well, like finger plus
[20:33.920 --> 20:38.460]  nail, or ninety plus nine. And once again, this is really useful, because now you don't have to
[20:38.460 --> 20:43.780]  have things like ninetynine, ninetyeight, or ninetythree in your data set or wordlist, because it's
[20:43.780 --> 20:49.680]  generating those on the fly. Going down a little bit further, this is your more
[20:49.680 --> 20:53.620]  traditional kind of rule: it's just two digits plus capitalizing the first letter of a word.
[20:53.780 --> 20:56.940]  You can see it's starting to do that. But "Settings" is a pretty uncommon
[20:56.940 --> 21:00.980]  word to use in a password, so it's trying it later in the cracking session.
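Here is a simplified sketch of multiword splitting, assuming a vocabulary of words already learned from the training set. This dynamic-programming version is an editorial illustration, not the trainer's actual algorithm, which has its own rules for deciding which words count as known. It prefers the split with the fewest pieces, which is how something like SESiscool can come out as SES plus iscool when iscool itself was learned as a word. Capitalization is dropped here, on the assumption that it is modeled separately.

```python
from typing import List, Optional, Set


def split_multiword(
    text: str, vocabulary: Set[str], min_len: int = 2
) -> Optional[List[str]]:
    """Split text into the fewest known words (at least two), else return None.

    A string that is itself a single known word returns None, since it is
    not a multiword.
    """
    text = text.lower()
    n = len(text)
    # best[i] holds the shortest split of text[i:], or None if there is none.
    best: List[Optional[List[str]]] = [None] * (n + 1)
    best[n] = []
    for i in range(n - 1, -1, -1):
        for j in range(i + min_len, n + 1):
            piece = text[i:j]
            rest = best[j]
            if piece in vocabulary and rest is not None:
                candidate = [piece] + rest
                if best[i] is None or len(candidate) < len(best[i]):
                    best[i] = candidate
    result = best[0]
    return result if result is not None and len(result) >= 2 else None
```

With a vocabulary such as `{"ses", "is", "cool", "iscool", "finger", "nail", "ninety", "nine"}`, this returns `["ses", "iscool"]` rather than the three-piece split, and `["ninety", "nine"]` for ninetynine.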
[21:01.740 --> 21:05.640]  And now it's combining even more mangling rules. It's doing multiwords,
[21:05.640 --> 21:10.920]  like wood plus fish and Tara plus Don, and adding digits to the end of those
[21:10.920 --> 21:15.420]  as well. So you can see how it starts stacking these different rules together. And I
[21:15.420 --> 21:18.500]  want to highlight that this cracking session has been going on, as you can see from the
[21:18.500 --> 21:23.860]  status output, for about 13 minutes. So all the really easy passwords have already been cracked.
[21:23.860 --> 21:27.580]  It's already guessed 123456, password12345, and so on.
[21:28.160 --> 21:32.420]  So these are starting to get into the fuzzier rules that you
[21:32.420 --> 21:41.170]  might not see in a normal password cracking session. As I mentioned a little
[21:41.170 --> 21:44.890]  bit earlier, if you hit the Enter key, it's going to go to my program, not John
[21:44.890 --> 21:49.470]  the Ripper. But there's a lot of information that I want to be able to provide to people about the
[21:49.470 --> 21:56.330]  cracking session. So whenever you hit Enter, or basically any other key, it'll display
[21:56.330 --> 22:00.370]  what it's currently doing. That way you can figure out whether you want to
[22:00.370 --> 22:03.630]  continue, whether it's working correctly, and whether it's doing what you want it to do
[22:03.630 --> 22:10.770]  as well. Going through this here, I hit Enter twice, so you can see how
[22:10.770 --> 22:15.930]  it's generating these password guesses. The first one
[22:15.930 --> 22:19.890]  is basically combining two words, so it's a multiword type
[22:19.890 --> 22:26.030]  of attack. And if you dig into the details, you can see
[22:26.030 --> 22:31.710]  that it's trying the 143rd most probable word with no capitalization and
[22:31.710 --> 22:35.550]  combining it with the 93rd most probable four-letter word, also with no capitalization.
[22:35.550 --> 22:39.590]  So you can see that the probabilities it assigns, even to individual words and
[22:39.590 --> 22:45.070]  stuff like that, are very, very fine-grained. So it's going to try some words, then do
[22:45.070 --> 22:49.290]  other mangling rules and stuff like that, and then go back to the less probable words later
[22:49.290 --> 22:56.150]  on in the cracking session. Now this next one here, and I'll try to get
[22:56.150 --> 23:00.190]  out of the way, you can see that it switched to a real brute force attack using
[23:00.190 --> 23:06.430]  OMEN, the Ordered Markov ENumerator. I'll talk about that a little bit later.
[23:06.690 --> 23:10.510]  But really, I kind of want to highlight though that it's trying, you know, more traditional cracking
[23:10.510 --> 23:15.570]  rules. So it's like, you know, combining words. And then it's switching to brute force. And then
[23:15.570 --> 23:19.130]  it'll switch to another mangling rule after this here. And then it'll just keep on going
[23:19.130 --> 23:26.290]  based upon whatever the current probability is. Now, as I said, I really struggle with documenting
[23:26.290 --> 23:32.190]  my code. So I try to go ahead and add as much documentation into the runtime behavior of it
[23:32.190 --> 23:37.350]  as possible. So if, instead of hitting enter or anything else along those lines, you hit
[23:37.350 --> 23:45.050]  H (and hit enter, or actually just H), it'll provide a help output explaining what all
[23:45.050 --> 23:50.450]  these different fields here mean. And that help output is actually much longer than what's
[23:50.450 --> 23:55.190]  displayed on the screen here. But it explains what all those different labels, like A5 or
[23:55.190 --> 24:01.410]  C5 actually stand for. The one thing that I kind of really want to highlight though is this one
[24:01.410 --> 24:08.890]  metric here called probability coverage. Since the PCFG password cracker creates guesses
[24:08.890 --> 24:12.710]  in probability order, it starts with the highly probable passwords, and it moves to less
[24:12.710 --> 24:18.310]  probable passwords and less probable passwords. And the model that it has will basically never
[24:18.310 --> 24:23.010]  finish. It'll just keep on finding new combinations of words to try.
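To make "guesses in probability order that never really finish" concrete, here is a minimal sketch assuming a toy grammar with a single base structure (a word followed by digits) and made-up probabilities. It uses a generic heap-based enumeration of pairs, which is not necessarily the priority-queue algorithm pcfg_cracker itself uses; the names `WORDS`, `DIGITS` and `ordered_guesses` are hypothetical.

```python
import heapq
from typing import Iterator, List, Tuple

# Hypothetical toy data: (term, probability), sorted most-probable first.
# The numbers are invented for illustration, not taken from a trained ruleset.
WORDS: List[Tuple[str, float]] = [("password", 0.5), ("monkey", 0.3), ("dragon", 0.2)]
DIGITS: List[Tuple[str, float]] = [("1", 0.6), ("123", 0.3), ("2020", 0.1)]


def ordered_guesses(a: List[Tuple[str, float]],
                    b: List[Tuple[str, float]]) -> Iterator[Tuple[str, float]]:
    """Yield every a+b concatenation in descending probability order.

    Each heap entry is an index pair (i, j). Popping (i, j) queues its two
    children (i+1, j) and (i, j+1); `seen` stops a pair being queued twice.
    Because both lists are sorted, a child is never more probable than its
    parent, so the top of the heap is always the next-best guess.
    """
    heap = [(-a[0][1] * b[0][1], 0, 0)]
    seen = {(0, 0)}
    while heap:
        neg_prob, i, j = heapq.heappop(heap)
        yield a[i][0] + b[j][0], -neg_prob
        for ni, nj in ((i + 1, j), (i, j + 1)):
            if ni < len(a) and nj < len(b) and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(heap, (-a[ni][1] * b[nj][1], ni, nj))


if __name__ == "__main__":
    for guess, prob in ordered_guesses(WORDS, DIGITS):
        print(f"{prob:.3f}  {guess}")
```

With a real grammar the lists are huge and there are many base structures, so in practice the stream is effectively endless, which is why a stopping rule is needed.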
[24:23.010 --> 24:28.470]  So a real challenge becomes, you know, when do you go ahead and give up on a cracking session?
[24:28.470 --> 24:34.130]  Say you haven't cracked the password. When should you go ahead and, you know, kill this off and try
[24:34.130 --> 24:38.610]  some other cracking type of attack that might be more successful? Or when do you go ahead and just
[24:38.610 --> 24:42.870]  choose to say, I'm not going to crack this password and move to a different case? So this probability
[24:42.870 --> 24:47.850]  coverage is a very fuzzy metric that I developed to give you
[24:47.850 --> 24:56.050]  a rough rule of thumb about when that should be. So what this metric says is that if the target
[24:56.050 --> 25:02.510]  password comes from the same probability distribution as the password data set that I trained on, and if my
[25:02.510 --> 25:08.030]  grammar modeling how these passwords are created was exactly correct, this is the
[25:08.030 --> 25:12.050]  probability that we would have cracked this password by now. Now, neither one of those assumptions is actually true
[25:12.050 --> 25:14.870]  in real life. The probability distribution of the password you're trying to crack is probably
[25:14.870 --> 25:21.090]  very different. You know, the grammar that I generate and train on is absolutely not perfect.
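A minimal sketch of the probability coverage idea, assuming coverage is simply the running sum of the model probabilities of the guesses made so far (my reading of the definition above, not the tool's code). The helper names and the Zipf-like demo distribution are hypothetical.

```python
from typing import Iterable, Iterator, Tuple

Guess = Tuple[str, float]  # (candidate password, model probability)


def with_coverage(guesses: Iterable[Guess]) -> Iterator[Tuple[str, float]]:
    """Pair each guess with the running probability coverage so far.

    Coverage is the summed model probability of every guess emitted, i.e. the
    chance of a crack *if* the target came from the training distribution and
    the grammar were a perfect model of it.
    """
    coverage = 0.0
    for guess, prob in guesses:
        coverage += prob
        yield guess, coverage


def guesses_until(guesses: Iterable[Guess], threshold: float) -> int:
    """Guesses needed before coverage first reaches `threshold` (or all of them)."""
    count = 0
    for _, coverage in with_coverage(guesses):
        count += 1
        if coverage >= threshold:
            break
    return count


def zipf_stream(n: int) -> Iterator[Guess]:
    """Made-up, heavy-tailed (Zipf-like) guess probabilities for the demo."""
    norm = sum(1.0 / r for r in range(1, n + 1))
    for rank in range(1, n + 1):
        yield f"guess{rank}", (1.0 / rank) / norm


if __name__ == "__main__":
    for target in (0.5, 0.7, 0.9):
        print(target, guesses_until(zipf_stream(100_000), target))
```

With a heavy-tailed distribution like this one, the number of guesses needed grows roughly exponentially with the coverage target, which matches the "jumps up fast, then crawls" behavior described next.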
[25:21.290 --> 25:24.550]  But at least, as I said, it kind of gives you a rule of thumb to say, okay, you know, this is
[25:24.550 --> 25:28.770]  starting to get a little bit high. It says, you know, I had like a 90% chance to crack this password.
[25:28.770 --> 25:33.310]  I haven't cracked this password yet. Maybe I should go ahead and give up. And you'll notice
[25:33.310 --> 25:36.730]  this number jumps up really high initially because it's making, you know, high probability
[25:36.730 --> 25:42.010]  password guesses. And then it slows to a crawl, barely advancing at all after you get to
[25:42.010 --> 25:47.650]  like 70, 80, or 90 percent completion. So this is really good
[25:47.650 --> 25:51.170]  for figuring out when I can go ahead and devote that one single
[25:51.170 --> 25:58.740]  CPU and that RAM to something else. So another usage tip I want to highlight
[25:58.740 --> 26:03.780]  is that sometimes the cracking dynamic when it comes to speed is completely reversed.
[26:04.220 --> 26:08.860]  So you might be trying to crack very, very computationally expensive
[26:08.860 --> 26:13.320]  password hashes, or a lot of, let's say, salted hashes, in which case you're really
[26:13.320 --> 26:17.100]  only making, you know, a couple guesses a second. Well, this generator is generating, let's say,
[26:17.100 --> 26:21.640]  somewhere between 100,000 and 4 million guesses a second.
[26:21.640 --> 26:27.320]  So it gets backlogged and essentially freezes while it waits to be able to send more
[26:27.320 --> 26:32.360]  guesses to the password cracking program. So occasionally, if you hit enter, it won't actually
[26:32.360 --> 26:36.560]  display the status, or it'll take a while to display the status. And that's usually
[26:36.560 --> 26:41.060]  what's happening. So if that's happening, and you're curious whether the password
[26:41.060 --> 26:45.800]  cracking session has crashed or not, I recommend going back to my earlier advice about sending a
[26:45.800 --> 26:49.300]  signal to, let's say, John the Ripper, and seeing how that's doing
[26:50.320 --> 26:53.620]  in order to make sure that your password cracking session is still running.
[26:56.680 --> 27:01.620]  So, as I talked about, the multi-word feature has probably been the biggest
[27:01.620 --> 27:10.140]  addition to the new 4.0 rewrite. And it has completely shocked me how effective this
[27:10.140 --> 27:15.420]  has been. So I won't go into too much detail about it. But the one thing I want to
[27:15.420 --> 27:21.060]  really stress, though, is that it is not language-specific at all. It learns what
[27:21.060 --> 27:25.980]  constitutes a word from the training set that you give it. So it'll pick up things
[27:25.980 --> 27:29.920]  like new band names or proper nouns that are really hard to capture in a language
[27:29.920 --> 27:36.000]  dictionary, or whatever new Pokemon just came out. And it identifies patterns like "I love"
[27:36.000 --> 27:43.040]  and stuff like that. So this is very useful when you
[27:43.040 --> 27:51.740]  are trying to target new password hashes. So, as I said, it's not language-specific.
[27:51.740 --> 27:58.200]  It works best with, I would say, English and other European languages.
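A simplified sketch of learning "what constitutes a word" from the training set alone, assuming a word counts as a base word when it appears by itself (as the only letter run in a password) at least twice, and that longer letter runs are then split into the fewest known base words. This is an illustration of the idea, not the trainer's actual multi-word algorithm; all names and thresholds are hypothetical.

```python
import re
from collections import Counter
from functools import lru_cache
from typing import Iterable, List, Optional, Set, Tuple


def learn_base_words(passwords: Iterable[str], min_count: int = 2,
                     min_len: int = 3) -> Set[str]:
    """Collect letter runs that show up on their own often enough.

    A run counts as standalone when it is the only alphabetic part of a
    password (e.g. 'pikachu' in 'pikachu1'). No language dictionary is used,
    so band names, proper nouns, etc. are learned purely from the data.
    """
    counts: Counter = Counter()
    for pw in passwords:
        runs = re.findall(r"[a-z]+", pw.lower())
        if len(runs) == 1 and len(runs[0]) >= min_len:
            counts[runs[0]] += 1
    return {word for word, count in counts.items() if count >= min_count}


def split_multiword(run: str, words: Set[str]) -> Optional[List[str]]:
    """Split `run` into the fewest known base words, or return None."""

    @lru_cache(maxsize=None)
    def best(start: int) -> Optional[Tuple[str, ...]]:
        if start == len(run):
            return ()
        candidates = []
        for end in range(start + 1, len(run) + 1):
            piece = run[start:end]
            if piece in words:
                rest = best(end)
                if rest is not None:
                    candidates.append((piece,) + rest)
        return min(candidates, key=len) if candidates else None

    result = best(0)
    return list(result) if result is not None else None


if __name__ == "__main__":
    training = ["pikachu1", "Pikachu!", "charizard99", "charizard", "qwerty"]
    base_words = learn_base_words(training)
    print(split_multiword("pikachucharizard", base_words))
```

Here "qwerty" appears only once, so it never becomes a base word, mirroring the limitation mentioned below: words that don't show up on their own in the training set won't be identified.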
[27:58.880 --> 28:04.980]  It still really struggles with some other languages, like Mandarin. But that is something
[28:04.980 --> 28:10.260]  that I absolutely want to focus on more going forward. It's not perfect. It's
[28:10.260 --> 28:14.260]  definitely a work in progress. So there is a balance between creating
[28:14.260 --> 28:19.180]  false positives and finding real matches. If the base words don't appear in the training set by
[28:19.180 --> 28:25.000]  themselves, it won't identify them. But it's something that is evolving. And the new
[28:25.540 --> 28:31.360]  pull request that I just received from somebody else actually has some improvements to this
[28:31.360 --> 28:39.000]  that I'm really excited about pushing to main. So one of the other big features that has
[28:39.000 --> 28:47.680]  been added recently is OMEN, the Ordered Markov ENumerator. And the whole reason why I'm talking
[28:47.680 --> 28:52.040]  about this is that a similar approach can be taken for pretty much anything. So if someone
[28:52.040 --> 28:58.860]  creates a better cracking attack or cracking mode, it can totally be incorporated into a PCFG-style
[28:58.860 --> 29:05.400]  attack. I'll be a little bit like the Borg in that respect. But the real challenge is to be
[29:05.400 --> 29:08.460]  able to figure out how to assign a probability to a password guess. So if you can assign a
[29:08.460 --> 29:11.600]  probability to a password guess, I can probably incorporate it into a PCFG.
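The "if you can assign a probability, it can be incorporated" point can be sketched as merging independent, probability-ordered guess streams into one. This assumes every stream is already sorted most-probable first and that the probabilities from different models are on a comparable scale, which in practice is exactly the hard part. The stream contents and the `merge_by_probability` name are hypothetical; `heapq.merge` with `key` and `reverse` is standard Python.

```python
import heapq
from typing import Iterable, Iterator, Tuple

Guess = Tuple[str, float]  # (candidate password, model probability)


def merge_by_probability(*sources: Iterable[Guess]) -> Iterator[Guess]:
    """Lazily interleave several probability-ordered guess streams.

    Every source must already be sorted most-probable first; heapq.merge only
    ever inspects the head of each stream, so none of them is materialized.
    """
    return heapq.merge(*sources, key=lambda guess: guess[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical streams with made-up probabilities.
    grammar_guesses = [("password1", 0.20), ("monkey1", 0.10), ("dragon12", 0.04)]
    markov_guesses = [("aaaa", 0.15), ("abc1", 0.05)]
    for guess, prob in merge_by_probability(grammar_guesses, markov_guesses):
        print(f"{prob:.2f}  {guess}")
```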
[29:15.320 --> 29:19.120]  So just kind of in the last little bit here, I really kind of want to highlight
[29:20.200 --> 29:25.920]  some additional tricks that are very useful when it comes to cracking passwords. So the first one
[29:25.920 --> 29:31.640]  here is this --skip_brute flag in the PCFG guesser. And basically what this does is disable OMEN guess
[29:31.640 --> 29:37.800]  generation. And that's not to say that OMEN guess generation is something that's bad to do.
[29:37.800 --> 29:41.600]  It definitely helps increase the success of a password cracking session.
[29:42.000 --> 29:46.820]  But this is a way to parallelize your attack. So if you have another system that's
[29:46.820 --> 29:51.860]  really cranking through your brute force attack, you might want to do all
[29:51.860 --> 29:56.680]  your brute force on that other system or on that other thread. And then run the PCFG guesser really
[29:56.680 --> 30:00.480]  just to focus on the word mangling rules instead. So in order to do that, all you need to do is
[30:00.480 --> 30:09.610]  add --skip_brute when you run it. Another flag that's really useful is
[30:09.610 --> 30:13.570]  the --all_lower flag. And what this means is it'll stop doing any sort of case mangling on the
[30:13.570 --> 30:19.950]  password guesses. So let me try to move my picture just a little bit here just to make it easier to
[30:19.950 --> 30:37.510]  read as I go back. I apologize. Okay. So a lot of times you may not want to do case
[30:37.510 --> 30:43.590]  mangling inside of PCFG itself. And one reason might be that the hash that you're targeting is
[30:43.590 --> 30:48.650]  case-insensitive, like LANMAN. That's probably not the best example, though, because if you're
[30:48.650 --> 30:52.010]  cracking LANMAN hashes, you're not using PCFG to do that. You're just going ahead
[30:52.010 --> 30:57.910]  and brute forcing that sucker and taking it out that way. The more likely reason, though, is that
[30:58.710 --> 31:05.010]  case mangling is very distinctive to how each person does it. So if someone does a certain type
[31:05.010 --> 31:09.850]  of case mangling, they have a tendency to keep on using that strategy for all their other passwords.
[31:10.010 --> 31:13.510]  So when you start doing things like targeted password cracking, you may not want to
[31:13.510 --> 31:17.450]  just do what everyone does. You want to create really specific case
[31:17.450 --> 31:23.210]  mangling for that particular individual. In that case, the better way to do this is with John the Ripper, which
[31:23.210 --> 31:30.970]  supports a really powerful feature called --pipe. What --pipe does is, instead of just
[31:30.970 --> 31:36.070]  taking the guesses in from standard input and running them as is, you can apply additional
[31:36.070 --> 31:41.010]  rules on top of that like you would do in a traditional password cracking dictionary type
[31:41.010 --> 31:47.250]  of attack. So you can specify your very specific case mangling rules inside John the Ripper's rule set
[31:47.250 --> 31:51.950]  and then pipe the lowercase password guesses right into John the Ripper and have John the Ripper
[31:51.950 --> 31:57.290]  capitalize it itself. And that can be very powerful when you have an idea of what type
[31:57.290 --> 32:12.600]  of case mangling you want to be able to target. So I of course moved it to the wrong portion here.
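To show what "have John the Ripper capitalize it itself" means, here is a sketch that re-implements a handful of John's single-character case commands (`:` no-op, `l` lowercase, `u` uppercase, `c` capitalize, `C` lowercase the first character and uppercase the rest, `t` toggle case) and applies target-specific rules to lowercase guesses. It approximates those commands for illustration and is no substitute for John's rule engine; multi-character commands such as position-based toggles are not supported. The rule list used below (a target who favors `c` and `C`) is hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator, List

# A few single-character John the Ripper case commands, re-implemented
# approximately for illustration (John's real rule engine does far more).
CASE_COMMANDS: Dict[str, Callable[[str], str]] = {
    ":": lambda w: w,                               # no-op
    "l": str.lower,                                 # lowercase everything
    "u": str.upper,                                 # uppercase everything
    "c": lambda w: w[:1].upper() + w[1:].lower(),   # capitalize
    "C": lambda w: w[:1].lower() + w[1:].upper(),   # inverse capitalize
    "t": str.swapcase,                              # toggle case of all letters
}


def apply_rules(guesses: Iterable[str], rules: List[str]) -> Iterator[str]:
    """Apply every rule to every (lowercase) guess, skipping per-guess repeats."""
    for guess in guesses:
        seen = set()
        for rule in rules:
            candidate = guess
            for cmd in rule:  # each character of a rule is one command
                candidate = CASE_COMMANDS[cmd](candidate)
            if candidate not in seen:
                seen.add(candidate)
                yield candidate


if __name__ == "__main__":
    # Hypothetical target who habitually capitalizes or inverse-capitalizes.
    for word in apply_rules(["monkey", "dragon1"], ["c", "C"]):
        print(word)
```

In a real session the equivalent rules would live in a John the Ripper rules section, with the guesser run in its all-lowercase mode and its output piped into John using --pipe and that rules section.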
[32:14.220 --> 32:16.800]  Let me move my screen again here. I apologize.
[32:18.640 --> 32:22.620]  Some improvements. As I mentioned, there was an amazing pull request that was
[32:23.060 --> 32:27.520]  submitted to me with a bunch of new features. I'm slowly incorporating them into the core,
[32:27.520 --> 32:32.120]  but the features are already available as their own tool called segmenter.py.
[32:32.120 --> 32:37.520]  And I apologize if I mispronounce your name, because I've only seen it written.
[32:37.520 --> 32:44.960]  But Chun-Wan Wang submitted this, and it really impresses me.
[32:45.120 --> 32:48.840]  So probably the biggest feature I'm really excited about is leet-speak replacement. This has
[32:48.840 --> 32:53.460]  been kind of my white whale as far as features to implement. And it's just
[32:53.900 --> 32:59.820]  every single time I've tried it, it's just not been very effective. But that's
[32:59.820 --> 33:06.660]  currently incorporated into this tool he has called segmenter.py. That's been included in the
[33:06.660 --> 33:10.620]  repo, and it will try to parse that information out. So I'm looking at getting that
[33:10.620 --> 33:14.600]  incorporated into my core trainer and getting that incorporated into password cracking sessions in
[33:14.600 --> 33:19.300]  order to really target that. He also improved some of the multi-word detection.
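As an illustration of the leet-speak replacement idea mentioned above (not how segmenter.py actually does it), here is a sketch that undoes common substitutions in a token and checks whether the result is a known base word, recording which positions were substituted. The substitution table, the mask format and the `unleet` name are all hypothetical, and the sketch assumes trailing digits have already been split off.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical, deliberately small substitution table. Real leet-speak is
# ambiguous ('1' can stand for 'i' or 'l'); this sketch just picks one.
LEET_TO_ALPHA: Dict[str, str] = {
    "4": "a", "@": "a", "3": "e", "1": "i",
    "0": "o", "5": "s", "$": "s", "7": "t",
}


def unleet(token: str, words: Set[str]) -> Optional[Tuple[str, str]]:
    """Return (base_word, mask) if undoing leet substitutions gives a known word.

    The mask marks substituted positions with 'L' and untouched ones with '.',
    the kind of per-position pattern a trainer could count to learn how
    people apply leet-speak. Returns None if nothing was substituted or the
    result is not a known base word.
    """
    lowered = token.lower()
    base = "".join(LEET_TO_ALPHA.get(ch, ch) for ch in lowered)
    if base == lowered or base not in words:
        return None
    mask = "".join("L" if ch in LEET_TO_ALPHA else "." for ch in lowered)
    return base, mask


if __name__ == "__main__":
    print(unleet("p@ssw0rd", {"password"}))
```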
[33:19.300 --> 33:24.740]  So he made that better. And then he also has incorporated some new approaches into the
[33:24.740 --> 33:28.420]  password scorer, which is a different tool: you submit your password to
[33:28.420 --> 33:32.720]  the password scorer, and it'll tell you what the probability of your password is,
[33:32.720 --> 33:38.800]  which is kind of nice as well. So all credit goes to him for this. I'm really impressed with this
[33:38.800 --> 33:45.020]  here. And if anyone else is looking at helping out too, I'm all about that. So thank you very
[33:45.020 --> 33:59.340]  much once again for that there. Okay, so let me move my screen around again here. Okay, so
[34:00.880 --> 34:04.680]  the next thing I want to talk about here is the compiled PCFG guesser. So I've been
[34:04.680 --> 34:11.080]  talking about the Python tool set up until now. The compiled PCFG guesser is a completely
[34:11.080 --> 34:15.860]  different fork. And as you can tell from the name, instead of being written in Python,
[34:15.860 --> 34:20.280]  it's written in compiled C code. It's a little bit harder to actually get installed and running
[34:20.280 --> 34:23.220]  simply because when you start talking about compiling your code, you know, it runs great
[34:23.220 --> 34:30.100]  on my machine, but it has challenges elsewhere. I tried to base its makefile on Hashcat's build
[34:30.100 --> 34:34.320]  process. So if you can build Hashcat on your computer, you at least have a better chance
[34:34.320 --> 34:40.780]  of getting this running as well. But if you have problems, please
[34:40.780 --> 34:47.260]  reach out to me on the GitLab or GitHub site, and I can try to help you fix them.
[34:47.900 --> 34:51.340]  So I will say that the trainer portion will always be written in Python.
[34:51.340 --> 34:56.820]  I just like writing Python too much to change that over. So basically, you'll go ahead and
[34:56.820 --> 35:01.780]  train rule sets with the Python trainer, but then copy them over to be used in the compiled
[35:02.560 --> 35:09.140]  version here. Also, the compiled version has a tendency to lag in features from the Python tool
[35:09.140 --> 35:15.800]  set, because once again, I like writing Python. I'm not the best C coder in the world. So basically,
[35:15.800 --> 35:19.980]  if I write a hello world program, it's going to have like five buffer overflows and, you know,
[35:20.080 --> 35:28.860]  a segfault. So take what you will there, but I'm making this available. If someone wants to
[35:28.860 --> 35:33.260]  write a better one, I'm totally open to that as well. But, you know, it doesn't have save restore,
[35:33.260 --> 35:38.040]  it doesn't have status output, and it has no OMEN guess generation. So all that being said,
[35:38.040 --> 35:43.580]  you know, why bother with this here? And really, at the end of the day,
[35:45.140 --> 35:51.720]  the main reason is it's about 20 times faster than the Python tool set. And I've always heard
[35:51.720 --> 35:55.760]  that, you know, C code is faster than Python. But when I saw that, I was like, holy crap.
[35:55.760 --> 36:00.880]  So I will be up front: even with all these limitations, when I'm cracking passwords,
[36:00.880 --> 36:06.620]  I'm using the compiled C version now much, much more often than I'm using the Python one there.
[36:07.100 --> 36:12.740]  Because that 20x speed improvement is hard to beat for most password cracking sessions.
[36:16.120 --> 36:22.420]  So now I'm going to talk real quick about training on passwords. So I've been talking about this a lot
[36:22.420 --> 36:25.400]  here. And there's a lot of different reasons why you want to go ahead and create a new password
[36:25.400 --> 36:32.380]  training set there. So language is a huge one. So you want to be able to train on passwords that
[36:32.380 --> 36:36.600]  are similar to the ones you're trying to crack. And another big one is that corporate
[36:36.600 --> 36:41.720]  passwords are very, very different from what you'll see from websites. And I'm sure you probably heard
[36:41.720 --> 36:49.060]  KoreLogic talk about this yesterday. But that's something that, you know,
[36:49.060 --> 36:53.380]  is very evident. So if you're trying to target corporate passwords, you probably do want to go
[36:53.380 --> 36:57.240]  ahead and train on corporate passwords versus going ahead and training on passwords for some
[36:57.240 --> 37:06.280]  gaming website. So another reason to go ahead and train it, though, is if you're targeting a specific
[37:06.280 --> 37:12.400]  password creation policy, or you know which mangling rules your target prefers. So one way to be able
[37:12.400 --> 37:17.440]  to really target that is to train only on passwords that match that policy.
[37:17.440 --> 37:23.500]  And there are other things you can do, too. With the rule files, the grammar
[37:23.500 --> 37:28.380]  that I generate, I made sure that I didn't include anything like a CRC check or any of those
[37:28.380 --> 37:31.740]  sanity checks. So you can actually open up the files themselves, they're just text
[37:31.740 --> 37:35.420]  files, and start editing the probabilities of different things in them by hand, too. So if you
[37:35.420 --> 37:38.960]  say, like, oh, this is one word I really want to make highly probable,
[37:38.960 --> 37:42.220]  but I don't want to have to train on a whole new training set. You can just go ahead and open
[37:42.220 --> 37:46.840]  it up, put that word in there, give it whatever probability you want, and that will just be read
[37:46.840 --> 37:52.160]  in and used in your password cracking session. The other reason to train on a password
[37:52.160 --> 37:55.360]  training set is that it generates a bunch of information and extracts a lot of information
[37:55.360 --> 38:00.440]  from that password set. So it's really useful to be able to analyze a new dump that you have
[38:00.440 --> 38:04.920]  accessible to you. For example, it'll pull out, like, common emails, it'll pull out
[38:04.920 --> 38:08.700]  dates and websites, and try to help you figure out where did this, you know, password data set
[38:08.700 --> 38:17.430]  come from. So the next question, of course, is where do you
[38:17.430 --> 38:22.350]  get these password data sets from? There are a lot of challenges with this, too, because a lot
[38:22.350 --> 38:28.650]  of data sets are not optimal for training. I don't know if you know of
[38:28.650 --> 38:32.990]  hashes.org, but it's a really great site for downloading all these dumps as
[38:32.990 --> 38:39.010]  they come out here. So for example, let's say you want to go ahead and train on this data set here.
[38:39.010 --> 38:43.050]  I'm not going to try and pronounce the name of that site, because I'm sure I'll just horribly,
[38:43.050 --> 38:48.270]  horribly mangle it. But when I did some googling, it turned out to be a site for
[38:49.230 --> 38:55.350]  new college students trying to find a job in China. So that's kind of an interesting
[38:55.350 --> 38:59.910]  data set that you might want to use to train for cracking passwords.
[39:00.750 --> 39:04.010]  So if you download something from, like, hashes.org, the first and most important thing is to select the
[39:04.010 --> 39:09.110]  plaintext option to train your rule set on, because you don't want to include the hash
[39:09.810 --> 39:12.370]  in your training set, because then the trainer will think it's part of the password and it goes
[39:12.370 --> 39:18.050]  poorly. So one other thing I really want to highlight here, and this is a
[39:18.050 --> 39:24.670]  feature that I hope to get added to the PCFG tool set: I was informed by the
[39:24.670 --> 39:32.290]  owner of the site here that they actually do some additional things for encoding non-UTF-8 characters
[39:33.310 --> 39:40.130]  that my trainer will not fully parse correctly. So that's something I need to add in too,
[39:40.130 --> 39:46.310]  so that it goes ahead and, you know, uses the correct character encoding for non-English
[39:46.310 --> 39:51.250]  passwords. So I just want to put that warning out for anyone trying to train on things like
[39:51.250 --> 39:58.900]  Mandarin. Now, the first problem with a lot of these dumps is that they don't contain
[39:58.900 --> 40:05.180]  duplicate passwords. And duplicates are really important when it comes to trying to figure out
[40:05.180 --> 40:09.140]  what the probability of a password is, because if you don't have duplicates, 123456
[40:09.140 --> 40:18.020]  looks just like any other random string. So duplicates are useful, but I will say when you run longer
[40:18.020 --> 40:22.740]  cracking sessions with PCFGs, that lack of duplicates becomes less and less important,
[40:22.740 --> 40:26.640]  because you've already exhausted all the really probable password guesses. The one issue though
[40:26.640 --> 40:32.300]  is that the OMEN portion really does struggle without the duplicates. So you might not want
[40:32.300 --> 40:35.860]  to enable OMEN guessing if you train on a data set that doesn't contain any duplicate
[40:35.860 --> 40:40.180]  passwords. The other problem with these dumps is that they only contain the passwords that have
[40:40.180 --> 40:45.140]  been cracked. So basically you don't know or learn anything about the passwords that haven't been
[40:45.140 --> 40:50.180]  cracked. That's not a deal breaker, but it's useful to keep in mind, and the
[40:50.680 --> 40:54.300]  cracked percentage is going to be very useful when it comes to figuring out how good a data set is
[40:54.300 --> 41:03.960]  for creating a new rule set. So in order to train on a password data set,
[41:04.360 --> 41:08.480]  you just run the trainer program. You give it the name of the rule set that you want to
[41:08.480 --> 41:12.660]  create, as well as the password data set that you want to train it on.
[41:13.240 --> 41:19.540]  And it'll run and do all the parsing and stemming of the
[41:19.540 --> 41:25.080]  password data set. It will try to auto-detect what the encoding is,
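The parsing and counting the trainer does, and why duplicate-free dumps hurt, can be illustrated with a much-simplified sketch. It assumes a base structure that only splits letters (A), digits (D) and everything else (O); the A/D labels follow the "A5"-style tags from the status output, but the O label for symbols is an assumption, and the real trainer also detects keyboard walks, years, multi-words and capitalization separately. Function names are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# A = letters, D = digits, O = anything else (O is an assumption here).
TOKEN = re.compile(r"(?P<A>[A-Za-z]+)|(?P<D>[0-9]+)|(?P<O>[^A-Za-z0-9]+)")


def base_structure(password: str) -> str:
    """Return a simplified base structure, e.g. 'password123!' -> 'A8D3O1'."""
    return "".join(f"{m.lastgroup}{len(m.group())}" for m in TOKEN.finditer(password))


def term_probabilities(terms: Iterable[str]) -> Dict[str, float]:
    """Turn raw counts into relative frequencies (a maximum-likelihood estimate)."""
    counts = Counter(terms)
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}


if __name__ == "__main__":
    leak = ["123456", "123456", "123456", "839201", "password123!"]
    digits = [m.group() for pw in leak for m in re.finditer(r"[0-9]+", pw)]
    print(term_probabilities(digits))       # duplicates kept: 123456 dominates
    print(term_probabilities(set(digits)))  # deduplicated dump: all look equal
```

With duplicates kept, 123456 gets three times the probability of 839201; once the dump is deduplicated, the two become indistinguishable.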
[41:25.780 --> 41:31.200]  but when in doubt, set it to be UTF-8 because the encoding really does matter quite a bit there.
[41:32.780 --> 41:36.000]  On the first pass it takes through the data set, it learns all the character
[41:36.000 --> 41:39.840]  frequencies and base words for multi-word detection. So it actually makes a couple
[41:39.840 --> 41:43.100]  different passes through the same data set in order to learn more and more about it.
[41:43.100 --> 41:48.640]  On the second pass, it'll do most of the real parsing of the passwords. So
[41:48.640 --> 41:52.260]  it figures out things like keyboard walks, alpha strings, letters, how
[41:52.260 --> 41:56.160]  probable the digits are, and stuff like that. So most of the stuff you think of traditionally
[41:56.160 --> 41:59.040]  when you talk about what the probabilities of different things
[41:59.040 --> 42:04.460]  are, it does on the second run through. And it actually makes a whole third run
[42:04.460 --> 42:09.080]  through to see how effective things like OMEN would be for cracking
[42:09.080 --> 42:13.160]  passwords. That ties back to how OMEN assigns probabilities using its
[42:13.160 --> 42:18.860]  different levels. And so this takes a while. If you're training it on a
[42:18.860 --> 42:23.260]  million passwords, you know, it's done in like a minute or two. If you're training it on a billion
[42:23.260 --> 42:31.000]  passwords, it takes significantly longer. And it has to keep all this data in memory. So if you're
[42:31.000 --> 42:37.000]  training on some of these really gigantic data sets, it's just not going to work. So one
[42:37.000 --> 42:40.820]  thing you might want to do is select a randomly chosen subset of that password set
[42:40.820 --> 42:48.620]  to train your rule set on instead. So after you're all done with that,
[42:48.620 --> 42:53.080]  though, it'll display statistics about the data set you just trained on, which are really
[42:53.080 --> 42:57.360]  useful for figuring out where it came from. So, password lengths and stuff
[42:57.360 --> 43:02.280]  like that. But one thing that I've added that I've found really useful is that it'll display
[43:02.280 --> 43:07.540]  the top URLs. The ones at the very top are usually
[43:07.540 --> 43:15.440]  email domains and email account information. But if you go down the list a little bit,
[43:15.440 --> 43:19.140]  you can usually see what the website is, because people have a tendency to use the
[43:19.140 --> 43:24.840]  website in their password. I also highlight the dates that it finds in there as well,
[43:24.840 --> 43:28.000]  because that's useful for trying to date when that password data set got leaked.
[43:28.220 --> 43:31.680]  Now, I want to kind of highlight that there's a long tail when it comes to the dates.
[43:32.280 --> 43:40.420]  I'm sorry. That's because people create passwords before the password data
[43:40.420 --> 43:48.040]  set gets stolen. So you'll see passwords with dates from years before the data set actually gets
[43:48.040 --> 43:51.280]  dumped. But if you go down the list a little bit, you can say, okay, that's probably
[43:51.280 --> 43:56.560]  about where the cutoff was for when this password data set was, you know, disclosed.
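A sketch of the date analysis described above, assuming "dates" means standalone four-digit years between 1950 and 2029. The `likely_cutoff` rule (the latest year still holding at least a given share of all year hits) is a hypothetical heuristic for eyeballing where the long tail ends, not something the trainer is known to compute.

```python
import re
from collections import Counter
from typing import Iterable

# Standalone years 1950-2029 (not embedded inside a longer digit run).
YEAR = re.compile(r"(?<![0-9])(19[5-9][0-9]|20[0-2][0-9])(?![0-9])")


def year_histogram(passwords: Iterable[str]) -> Counter:
    """Count standalone four-digit years that appear inside passwords."""
    hist: Counter = Counter()
    for pw in passwords:
        for year in YEAR.findall(pw):
            hist[int(year)] += 1
    return hist


def likely_cutoff(hist: Counter, min_share: float = 0.01) -> int:
    """Latest year still holding at least `min_share` of all year hits.

    A rough heuristic: the sparse hits past the leak date tend to be
    coincidental digit strings. Assumes `hist` is non-empty.
    """
    total = sum(hist.values())
    return max(year for year, count in hist.items() if count / total >= min_share)


if __name__ == "__main__":
    sample = ["bob1985", "alice2012!", "x2012y", "2013cat", "9920125", "zz2027"]
    hist = year_histogram(sample)
    print(sorted(hist.items()), likely_cutoff(hist, min_share=0.25))
```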
[43:59.490 --> 44:03.070]  So, one last thing I really want to talk about real quick is that I am trying to get this
[44:03.070 --> 44:08.150]  to work with other cracking modes. One of the really popular cracking modes
[44:08.150 --> 44:15.630]  used is called Prince. So, Prince basically takes a lot of different words and just combines them
[44:15.630 --> 44:21.690]  all together and makes lots of guesses based upon that. But one challenge with Prince is that it's
[44:21.690 --> 44:26.550]  very dependent upon the input word list that you give it, because the word list needs
[44:26.550 --> 44:31.770]  to have, you know, high quality words in it. But it also needs to have a level of kind of cruft in
[44:31.770 --> 44:35.650]  there too. If you want to, let's say, add the number 1 to the end of
[44:35.750 --> 44:42.070]  a word, you need to have 1 in your word list by itself. But the challenge is, the larger your word list
[44:42.070 --> 44:46.210]  is, the more words it has to combine, and it starts to have issues there
[44:46.210 --> 44:51.330]  as well. So, we have all this probability information about how a password was generated.
[44:51.330 --> 44:56.650]  So maybe we can use this to create very bespoke word lists for a Prince-style
[44:56.650 --> 45:02.770]  guessing session. So I created another tool called Princeling that does just that.
[45:02.770 --> 45:14.560]  So, it goes... I'm sorry, my microphone just went out again there. But yeah, so it creates a very
[45:14.560 --> 45:21.460]  high-quality word list, and does it automatically. Because one thing I like
[45:21.460 --> 45:26.280]  about Prince is that it's the kind of attack that I run when I want to goof off. So, like, you know,
[45:26.280 --> 45:28.920]  password cracking, sometimes it takes a lot of brain cells because you're kind of looking at
[45:28.920 --> 45:32.260]  how you're cracking. You're trying to optimize your cracking session. And Prince is like, I have
[45:32.260 --> 45:35.700]  no idea what I want to do. I want to go watch Tiger King on Netflix. Let's just go ahead and just
[45:35.700 --> 45:39.380]  launch this off and come back and see if it was successful. And Prince is usually actually quite
[45:39.380 --> 45:44.380]  successful. So, it's a pretty good tool to use. And anything that you can do in order
[45:44.380 --> 45:48.580]  to automate Prince even more, I'm all for, which is why I went ahead and created that.
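A naive sketch of building a Prince input list from grammar probabilities, assuming you already have per-category term probability tables (letters, digits, symbols) from a trained ruleset. This is not Princeling's actual algorithm: it simply ranks raw per-category probabilities against each other, whereas a more faithful approach would weight each term by how often its category appears in the base structures. The table contents and the `princeling_wordlist` name are hypothetical.

```python
from typing import Dict, List


def princeling_wordlist(tables: Dict[str, Dict[str, float]], size: int) -> List[str]:
    """Build a Prince input list from the most probable grammar terms.

    `tables` maps a category name (e.g. 'alpha', 'digits', 'other') to
    term -> probability. Every category competes on probability, so a very
    common digit string like '1' earns a place next to real words, giving
    Prince the short "cruft" elements it needs to append.
    """
    scored: Dict[str, float] = {}
    for table in tables.values():
        for term, prob in table.items():
            scored[term] = max(prob, scored.get(term, 0.0))
    # Highest probability first; ties broken alphabetically for stable output.
    ranked = sorted(scored.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:size]]


if __name__ == "__main__":
    tables = {
        "alpha": {"password": 0.4, "monkey": 0.1},
        "digits": {"1": 0.5, "123": 0.2},
        "other": {"!": 0.3},
    }
    print("\n".join(princeling_wordlist(tables, size=3)))
```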
[45:51.140 --> 45:55.920]  So, I'm going to go ahead and stop the live stream here. And hopefully, I'll be on Discord there in
[45:55.920 --> 45:58.940]  order to answer any questions that you have. I hope you enjoyed this. I hope this was helpful.
[45:59.200 --> 46:03.800]  And once again, you know, thank you for attending the Password Village here at DEF CON Safe Mode.
